
173 research outputs found

Direction-Based Surrounder Queries for Mobile Recommendations

Author: GAO Yunjun
GUO Xi
ISHIKAWA Yoshiharu
ZHENG Baihua
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2011

Institutional Knowledge at Singapore Management University

On Efficient Reverse Skyline Query Processing

Author: CHEN Gang
GAO Yunjun
LIU Qing
ZHENG Baihua
Publication venue: 'Elsevier BV'
Publication date: 01/06/2014

Institutional Knowledge at Singapore Management University

Continuous Obstructed Nearest Neighbor Queries in Spatial Databases

Author: GAO Yunjun
ZHENG Baihua
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Crossref

Institutional Knowledge at Singapore Management University

Indexing Metric Spaces for Exact Similarity Search

Author: Chen Lu
Gao Yunjun
Jensen Christian S.
Li Zheng
Miao Xiaoye
Song Xuan
Zhu Yifan
Publication date: 07/05/2020

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning, and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search because the complexity analyses of similarity query processing are hard to tell apart, and query performance depends on pruning and validation abilities that are related to the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting and to direct future research on metric indexes.
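As a rough illustration of how the triangle inequality can accelerate exact search in a metric space, the sketch below shows pivot-based filtering for a range query. It is only one common family of pruning techniques and is not taken from the survey itself; the function names, the single pivot, and the toy distance function are all assumptions made for the example. For a query q, radius r, pivot p, and object o, the triangle inequality gives |d(q, p) - d(o, p)| <= d(q, o), so any object whose lower bound exceeds r can be discarded without computing d(q, o).

```python
def build_pivot_table(objects, pivots, d):
    """Precompute the distance from every object to every pivot."""
    return [[d(o, p) for p in pivots] for o in objects]


def range_query(q, r, objects, pivots, table, d):
    """Return the objects within distance r of q, plus the number of
    object distances actually computed (hypothetical helper)."""
    q_to_pivots = [d(q, p) for p in pivots]
    results = []
    computed = 0
    for o, row in zip(objects, table):
        # Pruning: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        if any(abs(qp - op) > r for qp, op in zip(q_to_pivots, row)):
            continue
        # Validation: compute the real distance for the survivors.
        computed += 1
        if d(q, o) <= r:
            results.append(o)
    return results, computed


def abs_diff(x, y):
    """Toy metric on numbers, assumed only for this example."""
    return abs(x - y)


objects = [1, 5, 9, 14, 20]
pivots = [0]
table = build_pivot_table(objects, pivots, abs_diff)
results, computed = range_query(10, 2, objects, pivots, table, abs_diff)
```

In this toy run only one of the five objects survives the filter, so a single object distance is computed. With a real metric and data set, the pruning rate depends on the pivot choice and on the data distribution.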

arXiv.org e-Print Archive

VBN

An Efficient Source Model Selection Framework in Model Databases

Author: Chen Lu
Du Yuntao
Gao Yunjun
Yang Keyu
Zhao Minjun
Publication date: 24/11/2021

With the explosive increase of big data, training a Machine Learning (ML) model has become a computation-intensive workload that can take days or even weeks. Thus, reusing an already trained model, which is called transfer learning, has received attention. Transfer learning avoids training a new model from scratch by transferring knowledge from a source task to a target task. Existing transfer learning methods mostly focus on how to improve the performance of the target task through a specific source model, and assume that the source model is given. Although many source models are available, it is difficult for data scientists to select the best source model for the target task manually. Hence, how to efficiently select a suitable source model in a model database for model reuse is an interesting but unsolved problem. In this paper, we propose SMS, an effective, efficient, and flexible source model selection framework. SMS is effective even when the source and target datasets have significantly different data labels, is flexible enough to support source models with any type of structure, and is efficient because it avoids any training process. For each source model, SMS first vectorizes the samples in the target dataset into soft labels by directly applying this model to the target dataset, then uses Gaussian distributions to fit clusters of soft labels, and finally measures the distinguishing ability of the source model using a Gaussian mixture-based metric. Moreover, we present an improved SMS (I-SMS), which decreases the output number of the source model. I-SMS can significantly reduce the selection time while retaining the selection performance of SMS. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of SMS.

arXiv.org e-Print Archive

SEA: A Scalable Entity Alignment System

Author: Chen Lu
Gao Yunjun
Li Tianyi
Wei Zhiheng
Wu Junyang
Publication date: 01/01/2023

Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). State-of-the-art EA approaches generally use Graph Neural Networks (GNNs) to encode entities. However, most of them train the models and evaluate the results in a full-batch fashion, which prevents EA from scaling to large datasets. To enhance the usability of GNN-based EA models in real-world applications, we present SEA, a scalable entity alignment system that enables users to (i) train large-scale GNNs for EA, (ii) speed up the normalization and the evaluation process, and (iii) obtain clear results for assessing different models and parameter settings. SEA can be run on a computer with merely one graphics card. Moreover, SEA encompasses six state-of-the-art EA models and lets users quickly establish and evaluate their own models. Thus, SEA allows users to perform EA without being involved in tedious implementations, such as negative sampling and GPU-accelerated evaluation. With SEA, users can gain a clear view of the model performance. In the demonstration, we show that SEA is user-friendly and highly scalable even on computers with limited computational resources. Comment: SIGIR'23 Demo Track

arXiv.org e-Print Archive

VBN

Optimal-Location-Selection Query Processing in Spatial Databases

Author: Baihua Zheng
Gencai Chen
Qing Li
Yunjun Gao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Abstract: This paper introduces and solves a novel type of spatial query, namely, Optimal-Location-Selection (OLS) search, which has many applications in real life. Given a data object set DA, a target object set DB, a spatial region R, and a critical distance dc in a multidimensional space, an OLS query retrieves those target objects in DB that are outside R but have maximal optimality. Here, the optimality of a target object b ∈ DB located outside R is defined as the number of the data objects from DA that are inside R and whose distances to b do not exceed dc. When there is a tie, the accumulated distance from the data objects to b serves as the tie breaker, and the object with the smaller distance has the better optimality. In this paper, we present the optimality metric, formalize the OLS query, and propose several algorithms for processing OLS queries efficiently. A comprehensive experimental evaluation has been conducted using both real and synthetic data sets to demonstrate the efficiency and effectiveness of the proposed algorithms. Index Terms: Query processing, optimal-location-selection, spatial database, algorithm.
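The OLS definition above can be made concrete with a brute-force sketch. This is not one of the paper's algorithms, which the abstract does not describe. It is a naive baseline under stated assumptions: points are coordinate tuples, the distance is Euclidean, the region R is supplied as a hypothetical membership predicate `in_region`, and the tie-breaking accumulated distance is summed over the counted data objects only (the abstract leaves this detail open).

```python
from math import dist


def ols_query(data_objects, target_objects, in_region, dc):
    """Return the target objects outside R with maximal optimality."""
    # Only data objects inside R contribute to optimality.
    inside = [a for a in data_objects if in_region(a)]
    scored = []
    for b in target_objects:
        if in_region(b):
            continue  # only targets outside R are candidates
        near = [d for d in (dist(a, b) for a in inside) if d <= dc]
        # Sort key: more counted objects first, then smaller
        # accumulated distance (assumed tie-breaking rule).
        scored.append(((-len(near), sum(near)), b))
    if not scored:
        return []
    best = min(key for key, _ in scored)
    return [b for key, b in scored if key == best]


def in_unit_square(p):
    """Example region R: the unit square (an assumption)."""
    return 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0


DA = [(0.5, 0.5), (0.9, 0.5), (2.0, 2.0)]
DB = [(1.5, 0.5), (0.5, 0.5), (3.0, 3.0)]
result = ols_query(DA, DB, in_unit_square, dc=1.0)
```

Here two data objects lie inside R, and the target at (1.5, 0.5) is within dc of both, so it is the only object returned. The target at (0.5, 0.5) is skipped because it lies inside R. The sketch costs one distance computation per pair of in-region data object and candidate target.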

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

On Efficient k-optimal-location-selection Query Processing in Metric Spaces

Author: CHEN Lu
GAO Yunjun
LI Xinhan
QI Shuyao
ZHENG Baihua
Publication venue: 'Elsevier BV'
Publication date: 01/03/2015

Institutional Knowledge at Singapore Management University

Statistical Inference of Diffusion Networks

Author: Chen Lu
Gao Yunjun
Huang Hao
Jensen Christian S.
Yan Qian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

VBN

Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs

Author: Chen Lu
Gao Yunjun
Huang Xingrui
Jensen Christian S.
Zheng Bolong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/12/2020

VBN